Description

[0001] The invention relates to optimisation of feedforward neural networks.

[0002] A neural (or connectionist) network may comprise a structure of interconnected hardware devices, or it may
alternatively be implemented in software by which a serial or parallel computer is instructed to carry out neural network
processing. Both implementations are generally referred to as neural networks. In general, the motivation behind
construction of neural networks has been to emulate biological neural systems for solution of problems. Even on a
superficial level, there are several distinctive features about the way in which biological systems process information. For
example:

• They use a large number of individual processing units.

• Each processing unit does not store a large quantity of information, and performs a relatively simple computational
task. One simple model of such a processing unit simply involves computing a weighted summation of incoming
data and outputting this to other units.

• There is a high degree of interconnectivity among the processing units.

[0003] Systems with these features are often termed massively parallel.

[0004] This invention relates to feedforward networks, which are possibly the most commonly used neural network.
Such a network consists of a number of input units, one or more output units, and possibly one or more layers of hidden
units (sometimes referred to as inner neurons) between the input and output units. The input units are connected to the
hidden units, which in turn are connected to each output unit. Information is passed from layer to layer by means of
these links, each of which has a weight value. Adjustment of the connection weights of the network causes it to process
information differently. The name "feedforward" derives from the fact that information passes through the connections
in one direction only.

[0005] To construct a network which performs some useful function, it is necessary to adjust the weights to
appropriate values. This process is referred to as supervised "learning" or "training". In one example, training involves
repeated transmission of stimuli for presentation of examples from the problem in question and adjusting the
connections to reduce the overall error of the network on the examples. There are various processes by which training
may be carried out, including those referred to as stochastic (for example global optimisation) and deterministic (for
example backpropagation). Global optimisation is described in the paper Baba, N., "A New Approach for Finding the
Global Minimum of Error Function of Neural Networks", Neural Networks, Vol 2, pp 367-374, 1989, and backpropagation
is described in Rumelhart, D. and McClelland, J., "Parallel Distributed Processing: Explorations in the Microstructure of
Cognition", Vols 1 and 2, M.I.T. Press, 1986. In the backpropagation training process of a neural network having many
inputs, a single output, and a single hidden layer, examples have been repeatedly presented to the network and the
weight values have been adjusted so as to reduce some error measure of the difference between the network output and
the desired output. The usual error measure is the mean squared error.

[0006] Feedforward neural networks have been used in many applications, using both discrete and continuous
data. In the discrete case, they have been applied to many problems such as pattern recognition/classification which
may involve, for example, the grading of apples from photographic images, texture analysis of paper quality, or automatic
recognition of hand-written postal codes. Application areas where continuous valued data is used include time series
prediction and nonlinear regression. One of the most useful aspects of neural networks in these applications is the
potential of a suitably trained network to generalise, i.e. to achieve good performance on new examples not used in
training of the network. For example, in applying neural networks to recognition of hand-written postal code digits,
training is performed using handwriting from a small group of people. It would be useful for postcode recognition if the
network also performs well on handwriting samples other than those used in training of the network.

[0007] However there are difficulties in applying neural networks to specific applications. These problems can be
broadly stated as:

• Structural Difficulties: What is the most suitable network structure for the problem in question? The nature of the
problem itself often (but not always) suggests an appropriate number of input and output units to use. However the
number of internal or hidden units to use is difficult to determine. Using too few can result in a network not having
enough internal structure for the application in question. Using too many reduces the power of the network to
generalise.

• Learning Difficulties: Training a neural network is a slow process often requiring thousands of iterations for the error
to be reduced to an acceptable level. This is true no matter what process is used to train the network (though some
are faster than others).

[0008] PCT Patent Specification No. WO 91/02322 (Hughes Aircraft Company) discloses a process for training a
neural network which involves presenting a desired output to hidden units (inner neurons) as well as to output neurons.
In the specification there is no disclosure of a method of optimising the structure (i.e. the number of hidden units
required) either during or after training. Where the structure is not optimised, there is a tendency to start with using
considerably more hidden units than is necessary to ensure that there are sufficient to carry out the tasks required. This
results in a considerably longer time requirement for training of the network and also in operation of the network on an
on-going basis. Similarly US Patent Specification No. 5,056,037 (NASA) discloses a feedforward neural network and a
method of training it. Again, there is no disclosure of a method or process for optimisation of the structure.

[0009] The paper "A geometrical interpretation of hidden layer units in feedforward neural networks" by John
Mitchell, published in "Network: Computation in Neural Systems", Vol 3, pp 19-25, February 1992, discloses a process for
geometrical interpretation of hidden units in a trained feedforward neural network. Such a geometrical interpretation is
of limited value in some circumstances because the global effect of hidden units prevents deletion of them after training
of the network and thus optimisation of the neural network is difficult to achieve.

[0010] Unfortunately feedforward networks which use linear processing units are severely limited in the tasks that
can be performed. An example is the apparently straightforward classification task of the XOR (exclusive or) problem.
Given two input logical (i.e. 0 or 1) values, the requirement is that the network output a 1 if exactly one of the input
values is 1. It can be shown that a network with linear units, no matter how many and no matter what the layout, cannot
perform this task. Hence it is important that networks use nonlinear processing units. With nonlinear units, however,
direct and unique solutions for the parameters cannot be found. To find solutions, successive approximations which
gradually improve the parameter estimates must be used.

[0011] Learning processes or methods to perform this task have the following common features:

(1) The success of the process is gauged by examining an overall static indicator, or set of static indicators, giving
information about how the network is performing. The usual, though not exclusive, static indicator used is the mean
squared error of the network on the training data. The methods are collectively called supervised learning
techniques.

(2) The process involves repetitions, where in each cycle the training data is presented to the network and the
parameters are adjusted. This is done until the overall network learning criterion in (1) above is satisfactory.

[0012] The actual learning processes themselves may be classified as gradient based or perturbation based. In
gradient-based learning processes, the information to adjust the weight values of the network is determined from the
gradient of the error measure with respect to the various parameters.

[0013] Perturbation processes use a different approach. They involve making a small change in the parameters of
the network, using a random perturbation. Then the resulting network is checked, and if it is better, the parameters of
the network are kept. If not, the change is undone and a new one tried. An example of a stochastic perturbation process
is the "global optimisation" process.
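
The perturb, check, keep-or-undo cycle described above can be sketched as follows. This is a minimal illustration only, not the "global optimisation" process of the cited paper: the function names, the Gaussian perturbation and the `scale` and `seed` parameters are assumptions made for the example.

```python
import random


def perturbation_step(params, error_fn, scale, rng):
    """Try one small random change; keep it only if the error decreases."""
    candidate = [p + rng.gauss(0.0, scale) for p in params]
    if error_fn(candidate) < error_fn(params):
        return candidate  # the perturbed network is better: keep it
    return params         # otherwise undo the change


def train_by_perturbation(params, error_fn, steps=1000, scale=0.1, seed=0):
    """Repeat the perturbation step a fixed number of times (assumed stopping rule)."""
    rng = random.Random(seed)
    for _ in range(steps):
        params = perturbation_step(params, error_fn, scale, rng)
    return params
```

Because a change is only ever kept when it lowers the error, the error of the returned parameters can never exceed that of the starting parameters.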

[0014] A further distinction can be made within these learning processes. Either the weights can be adjusted each
time a training point is presented to the network (the adjustment being one of the types defined above), or the
adjustments can be done after a complete presentation of the training points. The former is termed on-line learning, and
the latter termed off-line learning. Not all adjustments can be done in both ways. For example, backpropagation can be
done on-line or off-line, but the BFGS learning method is usually done off-line, since it is only assured of being faster
than backpropagation in this case.

[0015] From the basic classifications of learning processes above, there have been many modifications that have
been claimed to improve the performance.

[0016] Because of the difficult nature of the parameter estimation problem, most extensions to the learning
processes are based on heuristics as to what to do next during learning. These heuristics would be postulated based on,
for example, some observations from repeated simulations in the past. One very simple such heuristic is adding a
momentum term to backpropagation. In this case, as well as the current gradient information, one also adds a certain
proportion of the previous gradient. This is based on a simple observation about the shape of the error surface (i.e.
a visualisation of the error measure). If the weight update given by the basic backpropagation process is causing the error
measure to decrease, then this can be speeded up by also adding the previous weight update. This simple heuristic has
been adopted in some cases, although its usefulness is disputed, as in Fogelman, F., "Neural Network Architectures and
Algorithms: A Perspective", pp. 605-615, Proceedings of the ICANN-91 Conference, Helsinki, Finland, Elsevier.
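
The momentum heuristic can be sketched as below. This is an illustrative gradient-descent loop rather than a full backpropagation implementation; `grad_fn`, `lr` and `momentum` are names assumed for the example, and each weight update is taken to be the gradient step plus a fixed proportion of the previous weight update.

```python
def momentum_descent(grad_fn, weights, lr=0.1, momentum=0.9, steps=100):
    """Gradient descent in which each update also adds a proportion of the previous update."""
    update = [0.0] * len(weights)  # previous weight update, initially zero
    for _ in range(steps):
        grad = grad_fn(weights)
        # new update = momentum * previous update - learning rate * current gradient
        update = [momentum * u - lr * g for u, g in zip(update, grad)]
        weights = [w + u for w, u in zip(weights, update)]
    return weights
```

With `momentum=0.0` the loop reduces to plain gradient descent, which makes it easy to compare the two on the same error surface.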
[0017] In general, therefore, there is a lack of consensus about what are the best learning processes to use. This
is because the nature of learning and convergence is highly complex, with little theory about the shape of error surfaces
when there are large numbers of parameters. The actual convergence properties of the system are highly dependent
on the nature of the problem, with simpler learning tasks (like the XOR problem) having different convergence
characteristics.

[0018] It is known that each unit can be assigned a unique identification and can then during the training method
be dynamically interpreted. This is done by interpretation of the transfer function of each individually identified hidden
unit or neuron. It is also known, for example, to interrupt the operation having regard to the dynamic interpretation of the
network performance, and specifically the dynamic interpretation of the performance of an individual unit, which can be
used to provide some form of indicator of the performance of that individual neuron or unit or of the network. This then
allows the training process to be interrupted. It is then known to modify the network and to continue the simulation.
In this way it is known, for example, to interrupt the training, change the network, and then continue the training
process and see whether this improves the performance of the network. This is clearly described in a paper by J. Nijhuis et
al., "A General-purpose neural network simulator" (Microprocessing and Microprogramming, vol. 27, No. 1/5, August
1989, Amsterdam, Netherlands, pages 189-194).

[0019] A somewhat similar system is described in European Patent Specification No. 0 360 674.

[0020] The invention is directed towards providing a method and apparatus for optimising a feedforward
neural network and also for improving efficiency of training.

[0021] According to the invention, there is provided a training method for a feedforward neural network having
hidden units comprising the steps of transmitting input stimuli to the network and adjusting connection weights in response
to monitoring the network output signals, for that training method, the steps including dynamically interpreting the
network performance during training by interpretation of the transfer function of an individual hidden unit using the
immediately connecting weights of that hidden unit; continuously generating a dynamic indicator of that network
performance; comparing the dynamic indicator to a desired dynamic indicator of performance; and interrupting the
training method when the dynamic indicator falls below the desired dynamic indicator of performance; characterised in
that the method comprises the additional steps when training is interrupted of:

generating a static indicator of the hidden unit's performance by carrying out a static interpretation of the overall
network performance with and without the hidden unit; and

altering the network internal structure in response to the static interpretation of the performance of the hidden unit.

[0022] In one embodiment, transfer functions of hidden units are analysed during dynamic interpretation. Preferably,
continuous and monotonic transfer functions having asymptotically zero derivatives are interpreted.

[0023] In one embodiment of the method and apparatus according to the invention, the dynamic indicator is
geometrical and is displayed.

[0024] Ideally, both dynamic and static interpretation involve relating operation of a hidden unit to the input data.

[0025] In another embodiment, static interpretation is at a macroscopic level, whereby global information relating to
network performance is interpreted.

[0026] According to another embodiment, the invention provides a training method for a feedforward neural network
comprising the further steps of:

storing characteristics of different training methods;

selecting an initial training process and carrying out training according to the method;

dynamically monitoring a feature of the training method;

evaluating the monitored feature according to a control condition; and

selecting a different training method for subsequent training stages according to the control condition.

[0027] In this latter embodiment, a plurality of control conditions are used for evaluating monitored features, a
control condition being specific to a training method, and another control condition being specific to all methods.
[0028] According to a further embodiment, the invention provides an apparatus for a training method of a
feedforward neural network, the apparatus comprising means for transmitting stimuli to the network and means for adjusting
connection weights in response to monitoring the network output signals, means for dynamically interpreting
performance of the network during training by interpretation of the transfer function of an individual hidden unit using the
immediately connecting weights of that hidden unit; means for continuously generating a dynamic indicator of the network
performance; means for comparing the dynamic indicator to a desired dynamic indicator of performance; and means
for interrupting the training method when the dynamic indicator falls below the desired dynamic indicator of
performance, characterised in that the apparatus further comprises:

communication means between the dynamic indicator and a static interpreter;

means in the static interpreter for generating a static indicator of the hidden unit's performance by carrying out a
static interpretation of the overall network performance with and without the hidden unit when the training is
interrupted; and

means for altering the network structure in response to the static interpretation of the performance of the hidden
unit.

[0029] According to a still further aspect, the invention provides an apparatus for implementing a training method of
a feedforward neural network including:

a plurality of training methods; and

the apparatus further comprises a training controller comprising:

means for storing characteristics of the training methods;

means for initiating training of the network according to a selected training method;

means for dynamically monitoring a training feature;

means for evaluating the monitored feature according to a control condition; and

means for selecting a different training method for subsequent training stages according to the control condition.

[0030] The method and apparatus of the invention will be more clearly understood from the following description of
some preferred embodiments thereof, given by way of example only with reference to the accompanying drawings in
which:-

Fig. 1 is a diagrammatic view showing a training method of the invention which involves optimising a feedforward
neural network structure;

Fig. 2 is a diagram showing a simple feedforward neural network and operation of a hidden unit;

Figs. 3(a) and 3(b) are diagrams showing the interactive and static interpretative steps of the method in more detail;

Fig. 4 is a diagram showing a portion of the internal structure of a feedforward neural network, and in particular a
hidden unit for which the inputs are two-dimensional;

Fig. 5 is a diagram showing a typical geometrical primitive generated by dynamic interpretation;

Fig. 6 is a diagram showing an alternative training method of the invention;

Fig. 7 is a set of graphical plots showing how it operates; and

Fig. 8 is a diagram showing the training method of Fig. 6 in more detail.

[0031] Referring to the drawings, and initially to Fig. 1, there is illustrated a training process or method which
involves optimising the structure of a feedforward neural network which has hidden units by determination of the
optimum number of hidden units. In more detail, the network has many inputs, a single output and a single layer of hidden
units. This could be used for solution of real valued approximation problems such as nonlinear regression applications.
The method is indicated by the reference numerals 1 to 5 inclusive. In Fig. 2, a typical feedforward neural network 6 is
shown.

[0032] There are three input units 7, five hidden units 8, and a single output unit 9.
[0033] Before describing the invention, the motivation behind it is further clarified by looking at the neural network
learning process in more detail. The procedure involved in training a neural network using the backpropagation learning
method is as follows. The architecture being considered here is the many input, single output, single hidden layer neural
network. This is to be used on real-valued approximation problems.

[0034] Having a set of $n$ examples of the (real valued) problem, say

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \qquad (1)$$

[0035] For each of these, in training the input to the network is the vector $x_i$, and the desired output of the network
is the scalar $y_i$. This set of examples is repeatedly presented to the network and the connecting weight values adjusted
so as to reduce some error measure of the difference between the network output on an example (say $N$) and
the desired output of the network $y_i$. The usual error measure used in this case is the mean square error:

$$\mathrm{MSE} = \sum_{i=1}^{n} \left(y_i - N(x_i, \theta)\right)^2 \qquad (2)$$

20 

[0036] Here the output of the network $N$ depends on the input vector $x_i$ and the adjustable parameters of the
network $\theta$. The values of these parameters are adjusted by training the network using some form of iterative process. In
most simulation environments this information is the sole guide used in training. Hence the decision that training is
complete is taken on the basis of the value of the MSE. Clearly this variable is a macroscopic performance measure which
does not directly take into account the internal structure of the network. There are several reasons why more detailed
information is not used. For example:

(a) It is not clear what information should be monitored. The relationship between individual connection values and
the overall performance of the network, for example, is not clear.

(b) The computational power requirement for obtaining useful information is prohibitive.

[0037] Hence information which would be useful in creating a better network structure during training is difficult to
obtain and interpret.

[0038] In the invention, useful geometrical information is generated during the simulation process. This requires
little processing power as it can be obtained in an efficient way by using the connection strengths of the network.

[0039] Referring to Fig. 1, the apparatus of the invention comprises the following components:-

- A neural network simulator core or processor 15 - this is a multilayer feedforward neural network simulator. Within
this component are the necessary circuits to implement a multilayer feedforward neural network and methods to
be used to train the network.

- A dynamic interpreter 16 - this portion of the apparatus interprets the internal state of the network during the
simulation. This is done using geometrical means, i.e. using the geometrical shape features of the hidden units.

- A static interpreter 17. Acting on information from the geometrical interpreter, it is possible to investigate the
internal structure of the network in more detail by interrupting the training process.

[0040] In step 1 of the method, training of the neural network is carried out by the neural network processor 15.
Training may be carried out by any suitable iterative process, such as backpropagation.

[0041] In step 2 of the method, dynamic interpretation of the structure of the neural network is carried out during
training. Briefly, dynamic interpretation of the network at the microscopic level is carried out. This involves interpretation
of the transfer function of a hidden unit using only the immediately connecting weights of that hidden unit. Thus, the
processing requirements for such dynamic interpretation are relatively small. Step 2 also involves displaying
performance indicators generated during dynamic interpretation. Dynamic interpretation is described in more detail below.

[0042] In step 3 of the method, user interaction takes place by analysis of the displayed indicators. In this
embodiment, the indicators are geometrical and a sample is shown in Fig. 3(a). A user input device such as a keyboard or
"mouse" is used for selection of a particular display, and activation of the static interpreter 17. The static interpreter 17
communicates with the dynamic interpreter 16 and with the neural network processor 15 to interrupt training of the
network and carry out a detailed analysis of the resultant static neural network at a macroscopic level.

[0043] Step 4 involves display of indicators representing the structure of the neural network in considerably more
detail than that generated by the dynamic interpreter 16. Such a display is shown in Fig. 3(b). In this example, the
maximum error of the network is shown before and after the removal of a hidden unit, with a symbol 19 indicating the
relevance. Thus, a user may see at a glance the relevance of a particular hidden unit both in detail, and by a simple
indicator.
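
The before-and-after comparison of step 4 can be sketched as follows. This is a hypothetical illustration, not the static interpreter 17 itself: it assumes a single hidden layer of sigmoidal units with a linear output unit, removes a unit simply by omitting its term (with no compensating constant), and all function and parameter names are chosen for the example.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def network_output(x, weights, biases, out_weights, out_bias, skip=None):
    """Output of a single-hidden-layer network, optionally leaving out hidden unit `skip`."""
    total = out_bias
    for i, (w, b, v) in enumerate(zip(weights, biases, out_weights)):
        if i == skip:
            continue  # treat this hidden unit as removed
        total += v * sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return total


def max_error(examples, weights, biases, out_weights, out_bias, skip=None):
    """Largest absolute error over (input, desired output) pairs."""
    return max(
        abs(y - network_output(x, weights, biases, out_weights, out_bias, skip))
        for x, y in examples
    )


def unit_relevance(examples, weights, biases, out_weights, out_bias, unit):
    """Maximum error before and after removal of one hidden unit."""
    before = max_error(examples, weights, biases, out_weights, out_bias)
    after = max_error(examples, weights, biases, out_weights, out_bias, skip=unit)
    return before, after
```

A unit whose removal leaves the maximum error essentially unchanged is a candidate for deletion in step 5.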

[0044] In step 5, a decision is made as to whether or not the particular unit is to be removed from the network. If so,
the neural network processor removes the unit, training resumes, and steps 1 to 5 are repeated. Such a decision may
alternatively be automatically made by the processor 15 without the need for static interpretation, if the geometrical
indicator from dynamic interpretation is clear enough.

[0045] In step 2, interpretation of a hidden unit does not require information relating to other hidden units of the
network. In practice, this information generally comprises weights connected directly to the hidden unit and boundaries of
the data. Accordingly, there is a relatively small processing requirement of the dynamic interpreter and training of a
network is not slowed down to any great extent.

[0046] A feedforward network with a single hidden layer of units and a single output unit performs the following type
of transformation from input to output:

$$N(x) = \psi\left(\sum_{i=1}^{N} v_i \, \varphi_i\left(\sum_{j=1}^{K} w_{ij} x_j + \beta_i\right) + \tau\right) \qquad (3)$$

[0047] Where there are $K$ input units, connected with weights $w_{ij}$ to $N$ hidden units, which in turn are connected
with weights $v_i$ to the single output unit. The structure also contains bias terms $\beta_i$ and $\tau$, which can be interpreted as
weight values to the hidden or output unit from a processing unit which always outputs 1. They are intended to give the
structure more flexibility in training. The power of this structure comes from the fact that the units within it process
information in a non-linear fashion. Hence each of the hidden units has a transfer function $\varphi_i$, and the output unit has a
transfer function $\psi$. These transfer functions are usually chosen to be identical and the most common is the sigmoidal
transfer function:

$$\varphi(x) = \frac{1}{1 + e^{-x}} \qquad (4)$$

which is monotonically increasing and has range between 0 and 1. This was originally proposed from biological
considerations.
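
The input-to-output transformation of such a network can be sketched in a few lines of Python. This is only an illustration: the names `sigmoid`, `network_output`, `w`, `beta`, `v` and `tau` are assumptions made here, the hidden units are taken to be sigmoidal, and the output unit is assumed to be linear unless another `output_fn` is supplied.

```python
import math


def sigmoid(t):
    """Standard sigmoid: monotonically increasing with range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-t))


def network_output(x, w, beta, v, tau, output_fn=lambda s: s):
    """Single hidden layer, single output.

    x    : list of K inputs
    w    : N rows of K weights (input to hidden)
    beta : N hidden bias terms
    v    : N weights (hidden to output)
    tau  : output bias term
    """
    hidden = [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i)
        for w_i, b_i in zip(w, beta)
    ]
    return output_fn(sum(v_i * h_i for v_i, h_i in zip(v, hidden)) + tau)
```

With all input weights and hidden biases set to zero, every hidden unit outputs 0.5, so the network output is simply half the sum of the output weights plus the output bias, which gives a quick sanity check.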

[0048] The output unit is assumed to have a linear or a piecewise linear transfer function. However, because the
object is to assess the relative contribution of the hidden units to the approximation, the process also involves use of
nonlinear transfer functions provided they are continuous and monotonic (such as the sigmoid function).
[0049] Dynamic interpretation is now described in detail for the case where the network input is two-dimensional.

[0050] In the case where the input to the network is two-dimensional, projection methods can be used to relate the
role of the hidden unit to the input space. Suppose the derivative of the transfer function is asymptotically zero. This
restriction is not a severe one, and most of the transfer functions proposed for feedforward networks satisfy it. In fact,
since gradient descent methods use the derivative of the transfer function in calculating the change to the weights, it is
necessary for this condition to hold for the teaching method not to be unstable. The transfer function is then projected
onto a subset of the input space, which is termed a geometrical primitive of the hidden unit. The following example
shows how this is done in the case of the sigmoidal transfer function.

[0051] The output of a hidden unit H is given by

$$\varphi(x_1, x_2) = \varphi(w_1 x_1 + w_2 x_2 + \beta) \qquad (5)$$

where the input to the network is $x = (x_1, x_2)$.

[0052] Fig. 4 shows an example of such a hidden unit 8. The additional term $\beta$, the bias term, may be considered
as an extra input to the network with a constant value 1. However, it is not essential that this bias term be present.
[0053] The object of dynamic interpretation is to capture the area of the input space for which the hidden unit is
most important (see area 20 of Fig. 3(a)). Intuitively, this is where the rate of change of the transfer function is greatest.
If the rate of change of the transfer function is very small, the hidden unit is just adding a constant term to the network
rather than doing any important processing. The projections of the hidden units described here use this property.

[0054] The contribution of the hidden unit to the output unit is given by $v\varphi(x)$ where $v$ is the connection weight to
the output unit. Since the derivative of the transfer function is asymptotically zero, given a small positive value $\varepsilon$, a region
can be found for which

$$v\varphi'(x) < \varepsilon \qquad (6)$$

or equivalently, for a small positive value $\alpha$, a region can be found such that

$$v(\varphi(x) - a) < \alpha \qquad (7)$$

or

$$v(b - \varphi(x)) < \alpha \qquad (8)$$

where the range of the transfer function is $[a, b]$.

[0055] Such a region is in fact bounded by two lines (see Fig. 5), which can be found quite simply algebraically. The
following example uses the standard sigmoid function.

[0056] Consider a hidden unit with transfer function given by (5). This has range [0,1]. Let $t = w_1 x_1 + w_2 x_2 + \beta$.
Then from (7) and (8) it is straightforward to show that (7) holds when:-

$$t < \log\left(\frac{\alpha}{v - \alpha}\right) \qquad (9)$$

and (8) holds when:-

$$t > \log\left(\frac{v - \alpha}{\alpha}\right) \qquad (10)$$

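[0057] Hence the hidden unit is most important in the region

$$\log\left(\frac{\alpha}{v - \alpha}\right) \le w_1 x_1 + w_2 x_2 + \beta \le \log\left(\frac{v - \alpha}{\alpha}\right) \qquad (11)$$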



[0058] This region is bounded by two lines in the input space, given by (9) and (10) with the inequalities replaced
by equalities. This type of indicator of the hidden units is referred to as a "geometrical primitive" of the hidden unit. This
projection can be used to give a rough idea of where the hidden unit is most important during the simulation. Altering
the value of $\alpha$ adjusts the sensitivity of the method to detect the region over which the unit is operating. For example, to
detect the region where the unit deviates from being constant by more than 5%, $\alpha$ is chosen to be 0.05. Fig. 5
shows an example 21 of this geometrical primitive.
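
As a hedged illustration, the two bounding lines can be computed directly from the unit's own weights. The sketch below assumes the standard sigmoid with range [0,1] and an output weight satisfying v > alpha > 0; under those assumptions, solving v·sigmoid(t) = alpha and v·(1 − sigmoid(t)) = alpha for t = w1·x1 + w2·x2 + bias gives the two thresholds. The function names are chosen for the example only.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def primitive_bounds(v, alpha):
    """Return (t_low, t_high); the unit is treated as important for t_low <= t <= t_high."""
    if not 0.0 < alpha < v:
        raise ValueError("this sketch assumes 0 < alpha < v")
    t_high = math.log((v - alpha) / alpha)  # where v * (1 - sigmoid(t)) equals alpha
    t_low = -t_high                         # where v * sigmoid(t) equals alpha
    return t_low, t_high


def in_primitive(x1, x2, w1, w2, bias, v, alpha):
    """True if the input point lies between the two bounding lines."""
    t_low, t_high = primitive_bounds(v, alpha)
    t = w1 * x1 + w2 * x2 + bias
    return t_low <= t <= t_high
```

The bounding lines themselves are `w1*x1 + w2*x2 + bias = t_low` and `w1*x1 + w2*x2 + bias = t_high`, so they can be drawn using only the weights connected to the unit.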

[0059] An important point is that the primitive can be determined using only information about the current state of
the unit and its connecting weights. Hence it can be determined efficiently. These projection lines can then be displayed
on screen as part of the geometrical interpretation process.

[0060] Dynamic interpretation for the general case is now described in detail.

[0061] For a hidden unit, the contribution to the output unit is of the form

$$v\varphi\left(\sum_{j=1}^{K} w_j x_j\right) \qquad (12)$$


EP 0 583 217 B1 



where there may also be a bias term included as input to the unit.

[0062] The set of examples which are being used to train the network come from a region of the input space. The
boundaries of this region are determined before training commences. The boundaries form a rectangular region. For $K$
inputs there are $2^K$ vertices to the region. Suppose the region is

$$R = [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_K, b_K] \qquad (13)$$

[0063] To obtain an estimate of how important the hidden unit is, the following steps are carried out.

A. The boundary points of the region are searched for the maximum and minimum of the hidden unit over them.
This can be done very quickly if the transfer function is monotonic. The procedure to get the maximum is this:

A1. Let

$$x^* = (x^*_1, x^*_2, \ldots, x^*_K)$$

be the desired vertex. Now let

$$x_{1,1} = (a_1, a_2, \ldots, a_K)$$

and

$$x_{2,1} = (b_1, a_2, \ldots, a_K)$$

Compare $\varphi(x_{1,1})$ and $\varphi(x_{2,1})$. If the former is less than the latter then $x^*_1$, the first co-ordinate of $x^*$, is $b_1$.
Otherwise it is $a_1$.

A2. Having found the first co-ordinate of the desired vertex, then let

$$x_{1,2} = (x^*_1, a_2, \ldots, a_K)$$

and

$$x_{2,2} = (x^*_1, b_2, \ldots, a_K) \qquad (17)$$

As in the first stage, compare $\varphi(x_{1,2})$ and $\varphi(x_{2,2})$, and assign $a_2$ or $b_2$ to $x^*_2$ as appropriate.

A3. Repeat this procedure until $x^*$ has been fully determined.

A4. An analogous procedure can be used to determine the boundary point of the region for which the value of
the transfer function is a minimum.

B. In step A the maximum and minimum of the transfer function over the region of the input space of interest were
calculated, $x_{\max}$ and $x_{\min}$ say. These values, used in conjunction with the range of the transfer function, give an
estimate of the importance of the hidden unit.

B1. Suppose the transfer function has range [a, b]. Compute:

$b - x_{\min}$ (18)

$x_{\max} - a$ (19)

The aim here is to estimate the maximum change in state of the network if the hidden unit is removed. To
minimise this change in state an appropriate constant may be added to the bias term of the output unit. Intuitively,
if the transfer function has a practically constant value over the region of interest (c, say) it makes sense to
remove the hidden unit and add c to the bias term of the output unit.

B2. The next step is to find the bigger of the quantities in (18) and (19). If it is (18), then, on removing the
hidden unit, add b to the bias term of the output unit.

Let: $C = b - x_{\min}$ (20)

C is termed the "contributory estimate" of the unit to the approximation.

B3. If (19) is the bigger, the unit should be replaced by adding a to the bias term of the output unit. The
contributory estimate in this case is:

$C = x_{\max} - a$ (21)


B4. The contributory estimate of the hidden unit is now related to the estimate of the change in state of the network
caused by removing that unit. The total contribution of a hidden layer of N units to the output of the network is an
expression of the form:

$S_N(x) = \sum_{i=1}^{N} v_i \phi_i(x) + \tau$ (22)



[0064] Where $\tau$ is the bias term of the output unit. Without loss of generality, by re-labelling if necessary, it can be
assumed that the Nth hidden unit is the one to be removed. If this unit is removed according to the above procedure,
the new contribution is

$S_{N-1}(x) = \sum_{i=1}^{N-1} v_i \phi_i(x) + \kappa + \tau$ (23)

where $\kappa$ is a or b depending on whether step B3 or B2 was followed. Note that:

$|S_N(x) - S_{N-1}(x)| \le C$ (24)

where C is the contributory estimate of the Nth hidden unit. In addition, the respective network outputs, $f_N(x)$ and
$f_{N-1}(x)$, are related by:

$|f_N(x) - f_{N-1}(x)| \le \lambda C$ (25)

where $\lambda$ is the maximum value of the derivative of the transfer function of the output unit. This relates the contributory
estimate of the hidden unit explicitly to an upper bound for the change in state induced by the removal of that unit.
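
The relationship between the contributory estimate and the bounds (24) and (25) can be checked numerically. The following is a minimal sketch only, not the implementation of the invention. It assumes, purely for illustration, a tanh output unit (whose derivative never exceeds 1), a transfer-function range [a, b] = [-1, 1], and a hidden unit whose state stays between the hypothetical values 0.90 and 0.98 over the region. The function names and the value `REST`, standing for the contributions of the other units plus the bias term, are likewise assumptions.

```python
import math

def contributory_estimate(x_min, x_max, kappa, a, b):
    """Contributory estimate C when the unit is replaced by the constant kappa."""
    if kappa == b:
        return b - x_min   # replacing the unit by the upper end of the range
    if kappa == a:
        return x_max - a   # replacing the unit by the lower end of the range
    raise ValueError("kappa must be an end point of the transfer-function range")

def bounds_hold(x_min, x_max, kappa, a, b, rest, lam=1.0, samples=21, tol=1e-12):
    """Check |S_N - S_(N-1)| <= C and |f_N - f_(N-1)| <= lam * C on sampled states."""
    c = contributory_estimate(x_min, x_max, kappa, a, b)
    for i in range(samples):
        phi = x_min + (x_max - x_min) * i / (samples - 1)  # a state of the unit
        s_with = rest + phi        # contribution with the unit present
        s_without = rest + kappa   # contribution after replacing it by kappa
        if abs(s_with - s_without) > c + tol:
            return False
        # tanh stands in for the output transfer function (assumed, lam = 1)
        if abs(math.tanh(s_with) - math.tanh(s_without)) > lam * c + tol:
            return False
    return True

# Hypothetical values: range [a, b] = [-1, 1], unit nearly saturated close to b.
A, B = -1.0, 1.0
X_MIN, X_MAX = 0.90, 0.98
REST = 0.3  # assumed sum of the other units' contributions plus the bias term

c_b = contributory_estimate(X_MIN, X_MAX, B, A, B)
c_a = contributory_estimate(X_MIN, X_MAX, A, A, B)
print(round(c_b, 2), round(c_a, 2))
print(bounds_hold(X_MIN, X_MAX, B, A, B, REST), bounds_hold(X_MIN, X_MAX, A, A, B, REST))
```

With these assumed numbers, replacing the unit by b changes the summed contribution by at most about 0.10, whereas replacing it by a could change it by up to 1.98.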
[0065] An important point is that for a problem with an N-dimensional input space only 2N evaluations of the hidden
unit transfer function are needed to find the maximum and minimum of the transfer function over the region of interest,
despite the region having $2^N$ end points. This is because the transfer function is monotonic and therefore not all the end
points of the region defined by (13) need be evaluated. This means that the information may be generated with little
processing. Accordingly, this information can be calculated during the simulation process and displayed as an aid to
deciding when to remove hidden units.
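
The co-ordinate-wise search of steps A1 to A4 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation of the invention: the tanh unit, the weights, the bias and the unit-cube region are assumed values, and `make_unit`, `extreme_vertex` and `brute_force` are hypothetical helper names. In this sketch each extremum costs two evaluations per input, and the scan over all vertices is included only as a cross-check.

```python
import math
from itertools import product

def make_unit(weights, bias=0.0):
    """Hidden unit: a monotonic transfer function (tanh) of a weighted sum."""
    def phi(x):
        return math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return phi

def extreme_vertex(phi, lower, upper, find_max=True):
    """Fix one co-ordinate at a time, comparing phi at a_k and at b_k."""
    vertex = list(lower)  # start from (a_1, ..., a_K)
    value = None
    for k in range(len(lower)):
        low_pt, high_pt = list(vertex), list(vertex)
        low_pt[k], high_pt[k] = lower[k], upper[k]
        low_val, high_val = phi(low_pt), phi(high_pt)  # two evaluations per input
        take_high = high_val > low_val if find_max else high_val < low_val
        vertex[k] = upper[k] if take_high else lower[k]
        value = high_val if take_high else low_val
    return vertex, value

def brute_force(phi, lower, upper, find_max=True):
    """Evaluate all 2^K vertices; used here only as a cross-check."""
    pick = max if find_max else min
    return pick(phi(v) for v in product(*zip(lower, upper)))

# Assumed example: three inputs on the unit cube.
weights, bias = [0.8, -1.5, 0.3], 0.1
lower, upper = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
phi = make_unit(weights, bias)

x_hat_max, phi_max = extreme_vertex(phi, lower, upper, find_max=True)
x_hat_min, phi_min = extreme_vertex(phi, lower, upper, find_max=False)
print(x_hat_max, x_hat_min)
```

The search relies on the unit being a monotonic function of a weighted sum, so that the best choice for one co-ordinate does not depend on the others.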
[0066] Operation of the static interpreter 17 is now described in detail.

[0067] The step 4 of static interpretation provides more detailed information about the network. For example, the
mean squared error of the network before and after the removal of a unit may be determined. Hence:

$MSE = \frac{1}{M} \sum_{i=1}^{M} (y_i - f_i)^2$ (26)



is calculated by presenting the M training examples, where $y_i$ is the required value and $f_i$ is the network output. Note
that the generalisation error can be investigated by calculating (26) for examples outside the training set but within the
scope of the required problem.
[0068] The maximum norm error of the network

$\|y - f\|_{\infty} = \max_{1 \le i \le M} |y_i - f_i|$ (27)

gives the absolute value of the maximum error over all the training points. Again the use of such an error measure is
not limited to the training set.
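
The two error measures can be sketched as below. This is a minimal illustration, assuming that the mean squared error is normalised by the number of examples M. The required values and the network outputs before and after removal of a unit are invented numbers, not results from the invention.

```python
def mean_squared_error(targets, outputs):
    """Mean of the squared differences over the M presented examples."""
    return sum((y - f) ** 2 for y, f in zip(targets, outputs)) / len(targets)

def max_norm_error(targets, outputs):
    """Largest absolute difference over the presented examples."""
    return max(abs(y - f) for y, f in zip(targets, outputs))

# Hypothetical required values and network outputs before and after removing a unit.
targets = [0.0, 1.0, 1.0, 0.0]
outputs_before = [0.1, 0.9, 0.8, 0.1]
outputs_after = [0.1, 0.8, 0.8, 0.2]

for label, outputs in (("before", outputs_before), ("after", outputs_after)):
    print(label, mean_squared_error(targets, outputs), max_norm_error(targets, outputs))
```

The same two functions can be applied to examples outside the training set in order to examine the generalisation error.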

[0069] These variables are related to the macroscopic performance of the network. In other words, they give
information about the behaviour of the network by observing external characteristics of it. This information may be used to
delete a hidden unit, either automatically according to the static interpreter output, or in response to a user decision, as
illustrated.

[0070] A very important aspect is the combination of dynamic interpretation followed by analysis and static interpretation,
and possible removal of hidden units, on an on-going basis, whereby training is interrupted only when more detail is
required of certain hidden units and decisions are to be made regarding deletion of the unit or alteration of the structure
generally. An important feature is that after training of the network an optimised internal structure of the network is
achieved without the need to carry out further geometrical interpretation or analysis of the network. The only
disadvantage is that training of the network takes slightly longer than would heretofore be the case. Because the structure
of the network is optimised, efficiency of the network in use for the desired application is significantly improved.
[0071] It is not essential that indicators arising from dynamic, or static, interpretation are displayed. For example,
they may be simply stored and subsequently accessed by the processor for use in making an automatic decision as to
whether or not the hidden unit should be deleted.

[0072] Another aspect of the invention is a manner in which the training or learning process may be improved, and
indeed optimised. This aspect of the invention may be carried out alone, or in combination with the structure
optimisation steps.

[0073] Heretofore, the learning method has been improved by observing the characteristics of the learning method,
following which an improved method is derived, using heuristics incorporating observations about the old learning
method.

[0074] In the invention improved learning is achieved by controlling different learning methods within the system.
[0075] A high-level view of the system is shown in Fig. 6. It is described as follows. The system (which may be the
processor 15) comprises a set or library of learning processors 29 for finding the weight values of the network. Several
such processors are described above. Examples are those operating according to the backpropagation and the global
optimisation methods. One of these methods will be used to adjust the parameters of the network at any individual
instant during learning. The system also comprises a learning manager component 30 which can dynamically alter
which of the processors is used to update the parameters of the network 31. This alteration is generated by control
conditions or rules within the manager 30 which guide the alteration. The rules within the learning manager can be few or
many, depending on the level of sophistication desired from the manager component 30. The rules within the manager
can also be generic (incorporating information about general features of the learning method) or specific (pertaining to
the performance of an algorithm on a specific problem or class of problem).

[0076] The rules may be acquired in various ways. Firstly, previous simulations using a single learning method may
suggest overall features of that method. This will be discussed in more detail in the following example. Secondly, a
method may have some computational or storage requirements that are known or can be estimated. This information
can be used to specify constraints for that method on the network structure on which it can be used. A simple learning
manager rule of this type forbids a method to be used if the number of weights of the network is over a certain fixed
value.

[0077] Hence the role of the manager 30 is to select the processor that is most appropriate during learning, in some
specific sense as defined by the learning manager rules, and hence to improve learning.

[0078] A specific embodiment is now described which addresses how these rules are arrived at, how they are used
by the learning manager 30 to control the simulation, and the manager itself.

[0079] During the use of the existing processors in the traditional manner of applying one process repeatedly until
a convergence criterion is reached, useful information for the application of our more general method may be gathered
by observing distinctive characteristics of the individual learning methods. Figure 7 shows an example. Suppose the
graphs of Figure 7 represent the average performance of three learning processors on a particular problem. By
observing the features of the simulations, we can make the following statements:

Process A "fails to converge".

Process B "displays good initial convergence, but poor final convergence". In other words, B performs well at the
beginning of the simulation, but does not work well once this initial period of the simulation is over.

Process C "is worse than B in the beginning of the simulation, but performs better during the final stages".

[0080] With this simple set of observations, the issue arises as to how to translate them into a form which can be
used to control the learning. We describe this translation, with a couple of simple examples, below.
[0081] Observations about the features of a given learning process are translated into a control condition. The
general form of a control condition is that it is some logical function of a statistic or measure of an attribute of the network
which is used to monitor the learning method. Hence the general form of control condition can be expressed as:

$F(S_1, S_2, \ldots, S_K) \in \{0, 1\}, \quad S_i = S_i(N(w))$ (28)

where $S_i$ is some indicator that depends on the output of the network N with weights w. Such indicators include the
mean squared error of the network, the number of units within the network, and the number of iterations of a learning
method on the network. The value of the indicator is not limited to the current value, but may also be a function of stored
previous values.

[0082] The following are some examples of the translation of feature observations to control conditions:

(1) The method is not converging. Hence the method should no longer be used if the mean squared error stops
decreasing. A control condition for this is:

$|E(t) - E(t-1)| > a$ (29)

where a is a constant representing the minimum acceptable change in the error, E(t) - E(t-1), brought about by
the algorithm at time t. Hence a represents a value "close to 0", the exact value being decided by the experimenter.
Therefore the algorithm which is used under this control condition will no longer be used when this condition
evaluates to FALSE.

A more sophisticated version of the control condition uses a moving average of the error measure to make the
condition less sensitive to small fluctuations in the performance of the learning method and more useful in detecting
the overall trend. For example, a 10-cycle moving average would use:

$\left| \frac{1}{10} \sum_{k=0}^{9} \left[ E(t-k) - E(t-k-1) \right] \right| > a$ (30)

which checks if the average change in the error measure is "close to 0". If it is, there should be a switch to another
learning method.

(2) The learning method should not be used if the network is large. This condition may be necessary for certain
learning methods because of storage requirements that are excessive if the network becomes large, or because
the computation scales badly as a function of network size. Second order learning methods (such as the BFGS
method) are an example of both of these features. If W is the number of weights within the network, then the
Hessian matrix of second order derivatives is of size $W^2$, and the number of computational steps needed to calculate
it is also of order $O(W^2)$. Control conditions for this situation are clearly given by bounds on the number of weights or
units within the network. This is seen to be a control condition since the number of units or weights within a network
can be viewed as a statistic or attribute of the network.
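
A size-based control condition of this kind might be sketched as follows. The sketch assumes a fully connected layered network with one bias weight per unit. The layer sizes and the threshold of 1000 weights are arbitrary illustrative values, and the function names are hypothetical.

```python
def count_weights(layer_sizes, with_bias=True):
    """Number of weights W in a fully connected layered feedforward network."""
    extra = 1 if with_bias else 0
    return sum((n_in + extra) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def hessian_entries(n_weights):
    """A dense matrix of second order derivatives has W * W entries."""
    return n_weights ** 2

def second_order_allowed(layer_sizes, max_weights):
    """Control condition: TRUE if the network is small enough for the method."""
    return count_weights(layer_sizes) <= max_weights

# Two assumed network shapes: (inputs, hidden units, outputs).
for sizes in ([10, 20, 1], [100, 200, 10]):
    w = count_weights(sizes)
    print(sizes, w, hessian_entries(w), second_order_allowed(sizes, max_weights=1000))
```

Because the condition depends only on the network structure, it can be evaluated before a learning method is started as well as after the structure has been altered.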
[0083] The following is an example of the method. In this example the learning manager controls the use of two
learning methods, method A being a first order gradient descent method (e.g. backpropagation) and method B being
based on second order gradients (e.g. the BFGS method). Method A uses a step-size parameter. For reasons
mentioned above, optimal choice of this parameter is difficult. Here we assume that a reasonable choice of parameter is
used, which gives acceptable but not optimal performance.

[0084] Based on observation, the following features have been observed about the processes:

(1) Method A requires less processing power during the initial stages of the simulation and thus is preferred.

(2) Method B has better convergence in the later periods of learning, where the extra computational requirement is
justified.

[0085] Part (1) can be easily translated into a high-level control condition for the learning manager, namely,
"initially, use A".

[0086] Part (2) translates as "after using A, use B". 

[0087] A control condition to determine when to switch processes is based on the dynamics of the learning. Such 
control conditions are used to detect situations when features associated with a given learning method hold. In this 
embodiment the feature is that process A is no longer converging. This is achieved using the moving average detector 
of the equation (30). 
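
A moving average detector of this kind could look like the following sketch. It is one possible reading of the condition, not the definitive form: it assumes that the detector averages the last ten error changes and compares the magnitude of that average with the constant a (called `alpha` below), and that the method is kept until the window is full. The class name, the helper `first_false` and the halving error curve are hypothetical.

```python
from collections import deque

class MovingAverageCondition:
    """TRUE while the magnitude of the mean error change over `window` cycles exceeds alpha."""

    def __init__(self, alpha, window=10):
        self.alpha = alpha
        self.changes = deque(maxlen=window)  # most recent values of E(t) - E(t-1)
        self.previous = None

    def update(self, error):
        if self.previous is not None:
            self.changes.append(error - self.previous)
        self.previous = error
        if len(self.changes) < self.changes.maxlen:
            return True  # assumption: keep the method until the window is full
        mean_change = sum(self.changes) / len(self.changes)
        return abs(mean_change) > self.alpha

def first_false(errors, condition):
    """Index of the first cycle at which the condition evaluates to FALSE, or None."""
    for t, error in enumerate(errors):
        if not condition.update(error):
            return t
    return None

# Assumed error curve: halves every cycle, so the changes shrink towards zero.
errors = [0.5 ** t for t in range(40)]
switch_at = first_false(errors, MovingAverageCondition(alpha=1e-3, window=10))
print(switch_at)
```

The stored window is an instance of an indicator that is a function of previous values rather than of the current value alone.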

[0088] Figure 8 shows the complete example. The example shows the rules as they might appear within a system
as a set of commands. Firstly, the moving average control condition is defined. Then, the control condition is attached
to method A. When method A runs, this control condition will act as a monitor to decide when it should be stopped. The
final two lines apply method A and then B. The method of transfer from A to B is determined by the control condition.
There are other control conditions which may be used. For example, the condition "stop the simulation when the mean
squared error is small" could be attached to both processes A and B. Therefore the simulation will be stopped when this
becomes true. Note that this may mean that learning method B is not used, if A is sufficient. The important point is that
a control condition is used in deciding what to do next. This may be to switch learning processes, or to end learning.
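
The sequence "apply A, then B" under control conditions can be illustrated by the sketch below. It is not a reproduction of Figure 8, and several simplifications are assumed: the error surface is a toy quadratic, method A is plain gradient descent, method B is a Newton step standing in for a second order method such as BFGS, and the condition attached to A is the single-step form (29) rather than the moving average. All names are hypothetical.

```python
def error(w):
    # Assumed toy error surface: E(w) = 0.5 * (w0^2 + 100 * w1^2).
    return 0.5 * (w[0] ** 2 + 100.0 * w[1] ** 2)

def method_a(w, rate=0.01):
    # Method A: a first order gradient descent step with a fixed step size.
    return [w[0] - rate * w[0], w[1] - rate * 100.0 * w[1]]

def method_b(w):
    # Method B: a Newton step, a simple stand-in for a second order method.
    return [w[0] - w[0] / 1.0, w[1] - (100.0 * w[1]) / 100.0]

class StillConverging:
    """Condition attached to one method: TRUE while the error change exceeds alpha."""

    def __init__(self, alpha):
        self.alpha, self.previous = alpha, None

    def __call__(self, err):
        previous, self.previous = self.previous, err
        return previous is None or abs(err - previous) > self.alpha

def learning_manager(schedule, w, stop_when_small, max_steps=1000):
    """Apply each (name, step, keep_going) in turn; switch when keep_going is FALSE."""
    history = []
    for name, step, keep_going in schedule:
        while len(history) < max_steps:
            w = step(w)
            err = error(w)
            history.append((name, err))
            if stop_when_small(err):
                return w, history  # condition shared by all methods: end learning
            if not keep_going(err):
                break              # method-specific condition: move to the next method
    return w, history

schedule = [("A", method_a, StillConverging(alpha=0.005)),
            ("B", method_b, lambda err: True)]
weights, history = learning_manager(schedule, [1.0, 1.0], lambda err: err < 1e-8)
print(len(history), history[-1])
```

In this toy run method A is abandoned once its error changes become small, and method B then finishes the learning. If the shared stopping condition were met while A was still running, B would never be applied.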



Claims

1. A training method for a feedforward neural network (6) having hidden units (8) comprising the steps of transmitting
input stimuli to the network and adjusting connection weights in response to monitoring the network output signals
for that training method, the steps including dynamically interpreting (2) the network performance during training by
interpretation of the transfer function of an individual hidden unit (8) using the immediately connecting weights of
that hidden unit (8); continuously generating a dynamic indicator (20) of that network performance; comparing the
dynamic indicator (2) to a desired dynamic indicator of performance; and interrupting the training method when the
dynamic indicator (20) falls below the desired dynamic indicator of performance; characterised in that the method
comprises the additional steps when training is interrupted of:

generating a static indicator of the hidden units (8) performance by carrying out a static interpretation of the
overall network performance with and without the hidden unit; and

altering the network internal structure in response to the static interpretation of the performance of the hidden
unit.

2. A training method as claimed in claim 1, wherein continuous and monotonic transfer functions having
asymptotically zero derivatives are interpreted.

3. A training method as claimed in any preceding claim wherein the dynamic indicator (20) is geometrical and is
displayed.

4. A training method as claimed in any preceding claim, wherein both dynamic and static interpretation involve relating 
operation of a hidden unit to the input data. 

5. A training method as claimed in any preceding claim wherein global information relating to network performance is
interpreted during static interpretation (4).

6. A training method as claimed in any preceding claim comprising the further steps of:

storing characteristics of different training methods (29);

selecting an initial training method (29) and carrying out training according to the method (29);

dynamically monitoring a feature of the training method (29);

evaluating the monitored feature according to a control condition; and

selecting a different training method (29) for subsequent training stages according to the control condition.

7. A training method as claimed in claim 6, wherein a plurality of control conditions are used for evaluating monitored
features, a control condition being specific to a training method, and another control condition being specific to all
methods.

8. An apparatus for a training method of a feedforward neural network (6) having hidden units (8), the apparatus
comprises means (15) for transmitting stimuli to the network (6); means (15) for adjusting connection weights in
response to monitoring the network output signals, means (16) for dynamically interpreting performance of the
network (6) during training by interpretation of the transfer function of an individual hidden unit (8) using the
immediately connecting weights of that hidden unit (8); means for continuously generating a dynamic indicator (20) of the
network performance; means for comparing the dynamic indicator (20) to a desired dynamic indicator of
performance; and means for interrupting the training method when the dynamic indicator (20) falls below the desired
dynamic indicator of performance, characterised in that the apparatus further comprises:

communications means between the dynamic indicator (20) and a static interpreter (17);

means in the static interpreter (17) for generating a static indicator of the hidden units (8) performance by
carrying out a static interpretation of the overall network performance with and without the hidden unit when the
training is interrupted; and

means for altering the network structure in response to the static interpretation of the performance of the
hidden unit.

9. An apparatus for implementing a training method as claimed in claim 8, including:

a plurality of training methods (29); and

the apparatus further comprises a training controller (30) comprising:

means for storing characteristics of the training methods;

means for initiating training of the network according to a selected training method;

means for dynamically monitoring a training feature;

means for evaluating the monitored feature according to a control condition; and

means for selecting a different training method or for subsequent training stages according to the control
condition.

Patentansprüche

1. Trainingsverfahren für ein neuronales Vorwärtsnetzwerk (6) mit versteckten Einheiten (8), umfassend die folgenden
Schritte: Übertragen von Eingangsstimuli zu dem Netzwerk und Justieren von Verbindungsgewichtungen als
Reaktion auf die Überwachung der Netzwerkausgangssignale für dieses Trainingsverfahren, wobei die Schritte
folgendes beinhalten: dynamisches Interpretieren (2) der Netzwerkleistung während des Trainings durch Interpretation
der Transferfunktion einer individuellen versteckten Einheit (8) anhand der sofort verbindenden Gewichtungen
dieser versteckten Einheit (8); kontinuierliches Erzeugen eines dynamischen Indikators (20) dieser Netzwerkleistung;
Vergleichen des dynamischen Indikators (2) mit einem gewünschten dynamischen Leistungsindikator; und
Unterbrechen des Trainingsverfahrens, wenn der dynamische Indikator (20) unter den gewünschten dynamischen
Leistungsindikator fällt; dadurch gekennzeichnet, daß das Verfahren bei Unterbrechung des Trainings die folgenden
zusätzlichen Schritte umfaßt:

Erzeugen eines statischen Indikators der versteckten Leistungseinheiten (8) durch Durchführen einer
statischen Interpretation der gesamten Netzwerkleistung mit der und ohne die versteckte(n) Einheit; und

Ändern der internen Netzwerkstruktur als Reaktion auf die statische Interpretation der Leistung der
versteckten Einheit.

2. Trainingsverfahren nach Anspruch 1, bei dem kontinuierliche und monotone Transferfunktionen mit asymptotischen
Nullableitungen interpretiert werden.

3. Trainingsverfahren nach einem der vorherigen Ansprüche, bei dem der dynamische Indikator (20) geometrisch ist
und angezeigt wird.

4. Trainingsverfahren nach einem der vorherigen Ansprüche, bei dem die dynamische und die statische Interpretation
einen Vergleichsvorgang zwischen einer versteckten Einheit und den Eingangsdaten beinhaltet.

5. Trainingsverfahren nach einem der vorherigen Ansprüche, bei dem globale Informationen über die
Netzwerkleistung während der statischen Interpretation (4) interpretiert werden.

6. Trainingsverfahren nach einem der vorherigen Ansprüche, ferner umfassend die folgenden Schritte:

Speichern der Charakteristiken unterschiedlicher Trainingsverfahren (29);

Wählen eines ersten Trainingsverfahrens (29) und Durchführen des Trainings gemäß dem Verfahren (29);

dynamisches Überwachen eines Merkmals des Trainingsverfahrens (29);

Beurteilen des überwachten Merkmals gemäß einem Steuerzustand; und

Wählen eines anderen Trainingsverfahrens (29) für nachfolgende Trainingsphasen gemäß dem Steuerzustand.

7. Trainingsverfahren nach Anspruch 6, bei dem eine Mehrzahl von Steuerzuständen zum Beurteilen überwachter
Merkmale verwendet wird, wobei ein Steuerzustand spezifisch für ein Trainingsverfahren und ein anderer
Steuerzustand spezifisch für alle anderen Verfahren ist.

8. Vorrichtung für ein Trainingsverfahren eines neuronalen Vorwärtsnetzwerkes (6) mit versteckten Einheiten (8),
wobei die Vorrichtung folgendes umfaßt: ein Mittel (15) zum Übertragen von Stimuli zu dem Netzwerk (6); ein Mittel
(15) zum Justieren von Verbindungsgewichtungen als Reaktion auf die Überwachung der Netzwerkausgangssignale;
ein Mittel (16) zum dynamischen Interpretieren der Leistung des Netzwerkes (6) während des Trainings
durch Interpretation der Transferfunktion einer individuellen versteckten Einheit (8) anhand der sofort verbindenden
Gewichtungen dieser versteckten Einheit (8); ein Mittel zum kontinuierlichen Erzeugen eines dynamischen
Indikators (20) der Netzwerkleistung; ein Mittel zum Vergleichen des dynamischen Indikators (20) mit einem gewünschten
dynamischen Leistungsindikator; und ein Mittel zum Unterbrechen des Trainingsverfahrens, wenn der
dynamische Indikator (20) unter den gewünschten dynamischen Leistungsindikator fällt, dadurch gekennzeichnet,
daß die Vorrichtung ferner folgendes umfaßt:

ein Kommunikationsmittel zwischen dem dynamischen Indikator (20) und einem statischen Interpretierer (17);

ein Mittel in dem statischen Interpretierer (17) zum Erzeugen eines statischen Indikators der Leistung der
versteckten Einheiten (8) durch Durchführen einer statischen Interpretation der gesamten Netzwerkleistung mit
der und ohne die versteckte(n) Einheit, wenn das Training unterbrochen wird; und

ein Mittel zum Ändern der Netzwerkstruktur als Reaktion auf die statische Interpretation der Leistung der
versteckten Einheit.



9. Vorrichtung zum Ausführen eines Trainingsverfahrens nach Anspruch 8, umfassend:

eine Mehrzahl von Trainingsverfahren (29); und

wobei die Vorrichtung ferner einen Trainings-Controller (30) umfaßt, der folgendes aufweist:

ein Mittel zum Speichern von Charakteristiken der Trainingsverfahren;

ein Mittel zum Einleiten von Training des Netzwerkes gemäß einem gewählten Trainingsverfahren;

ein Mittel zum dynamischen Überwachen eines Trainingsmerkmals;

ein Mittel zum Beurteilen des überwachten Merkmals gemäß einem Steuerzustand; und

ein Mittel zum Wählen eines anderen Trainingsverfahrens oder für nachfolgende Trainingsphasen gemäß dem
Steuerzustand.

Revendications 

1. Méthode d'apprentissage pour un réseau neuronal à réaction anticipative (6) ayant des unités cachées (8)
comprenant les étapes consistant à transmettre des excitations d'entrée au réseau et à ajuster des pondérations de
connexion en réponse au contrôle des signaux de sortie du réseau pour cette méthode d'apprentissage, les étapes
incluant interpréter dynamiquement (2) la performance du réseau durant l'apprentissage par l'interprétation de la
fonction de transfert d'une unité cachée individuelle (8) en utilisant les pondérations de connexion immédiate de
cette unité cachée (8); générer continuellement un indicateur dynamique (20) de la performance de ce réseau;
comparer l'indicateur dynamique (2) avec un indicateur dynamique désiré de performance; et interrompre la
méthode d'apprentissage lorsque l'indicateur dynamique (20) tombe en dessous de l'indicateur dynamique désiré
de performance; caractérisée en ce que, lorsque l'apprentissage est interrompu, la méthode comprend les étapes
supplémentaires consistant à:

générer un indicateur statique de la performance de l'unité cachée (8) en réalisant une interprétation statique
de la performance globale du réseau avec et sans l'unité cachée; et

altérer la structure interne du réseau en réponse à l'interprétation statique de la performance de l'unité cachée.

2. Méthode d'apprentissage telle que revendiquée dans la revendication 1, dans laquelle les fonctions de transfert
continues et monotones ayant des dérivées asymptotiquement de zéro, sont interprétées.

3. Méthode d'apprentissage telle que revendiquée dans l'une quelconque des revendications précédentes, dans
laquelle l'indicateur dynamique (20) est géométrique et est affiché.

4. Méthode d'apprentissage telle que revendiquée dans l'une quelconque des revendications précédentes, dans
laquelle les interprétations dynamique et statique comprennent la mise en relation de l'opération d'une unité
cachée avec les données d'entrée.

5. Méthode d'apprentissage telle que revendiquée dans l'une quelconque des revendications précédentes, dans
laquelle les informations globales se rapportant à la performance du réseau sont interprétées durant l'interprétation
statique (4).

6. Méthode d'apprentissage telle que revendiquée dans l'une quelconque des revendications précédentes,
comprenant les étapes ultérieures consistant à:

mémoriser les caractéristiques de différentes méthodes d'apprentissage (29);

sélectionner une méthode d'apprentissage initiale (29) et exécuter l'apprentissage selon la méthode (29);

contrôler dynamiquement une caractéristique de la méthode d'apprentissage (29);

évaluer la caractéristique contrôlée selon une condition de contrôle; et

sélectionner une méthode d'apprentissage différente (29) pour des étapes d'apprentissage subséquentes
selon la condition de contrôle.

7. Méthode d'apprentissage telle que revendiquée dans la revendication 6, dans laquelle une pluralité de conditions
de contrôle sont utilisées pour évaluer des caractéristiques contrôlées, une condition de contrôle étant spécifique
à une méthode d'apprentissage, et une autre condition de contrôle étant spécifique à toutes les méthodes.

8. Appareil pour une méthode d'apprentissage d'un réseau neuronal à réaction anticipative (6) ayant des unités
cachées (8), l'appareil comprend un moyen (15) pour transmettre des excitations au réseau (6); un moyen (15)
pour ajuster des pondérations de connexion en réponse au contrôle des signaux de sortie du réseau, un moyen
(16) pour interpréter dynamiquement la performance du réseau (6) pendant l'apprentissage par interprétation de
la fonction de transfert d'une unité cachée individuelle (8) en utilisant les pondérations de connexion immédiate de
cette unité cachée (8); un moyen pour générer continuellement un indicateur dynamique (20) de la performance du
réseau; un moyen pour comparer l'indicateur dynamique (20) avec un indicateur dynamique désiré de
performance; et un moyen pour interrompre la méthode d'apprentissage lorsque l'indicateur dynamique (20) tombe en
dessous de l'indicateur dynamique désiré de performance, caractérisé en ce que l'appareil comprend en outre:

un moyen de communication entre l'indicateur dynamique (20) et un interpréteur statique (17);

un moyen dans l'interpréteur statique (17) pour générer un indicateur statique de la performance de l'unité
cachée (8) par la mise en oeuvre d'une interprétation statique de la performance globale du réseau, avec et
sans l'unité cachée lorsque l'apprentissage est interrompu; et

un moyen pour altérer la structure de réseau en réponse à l'interprétation statique de la performance de l'unité
cachée.

9. Appareil pour mettre en oeuvre une méthode d'apprentissage tel que revendiqué dans la revendication 8, incluant:

une pluralité de méthodes d'apprentissage (29); et

l'appareil comprend en outre un contrôleur d'apprentissage (30) comprenant:

un moyen pour mémoriser des caractéristiques des méthodes d'apprentissage;

un moyen pour amorcer l'apprentissage du réseau selon une méthode d'apprentissage sélectionnée;

un moyen pour contrôler dynamiquement une caractéristique d'apprentissage;

un moyen pour évaluer la caractéristique contrôlée selon une condition de contrôle; et

un moyen pour sélectionner une méthode d'apprentissage différente ou pour des étapes d'apprentissage
subséquentes selon la condition de contrôle.